Including Dialects and Language Varieties in Author Profiling

Authors

  • Alina Maria Ciobanu
  • Marcos Zampieri
  • Shervin Malmasi
  • Liviu P. Dinu
Abstract

This paper presents a computational approach to author profiling that takes gender and language variety into account. We apply an ensemble system that combines the outputs of multiple linear SVM classifiers trained on character and word n-grams. We evaluate the system on the dataset provided by the organizers of the 2017 PAN lab on author profiling. Our approach achieved 75% average accuracy on gender identification for tweets written in four languages and 97% accuracy on language variety identification for Portuguese.
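
The general idea of an ensemble of linear SVMs over character and word n-grams can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' system: the hinge-loss SGD trainer, the choice of n-gram orders, the majority-vote fusion rule, and all function and class names (`char_ngrams`, `word_ngrams`, `LinearSVM`, `build_ensemble`, `ensemble_predict`) are assumptions made for the example, and the Portuguese sentences are toy data.

```python
import random
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class LinearSVM:
    """Tiny binary linear SVM (hinge loss + L2 penalty) trained with SGD.

    Hypothetical helper for illustration; labels must be +1 or -1.
    """

    def __init__(self, extract, lam=0.01, epochs=50, seed=0):
        self.extract = extract  # function mapping text -> Counter of features
        self.lam = lam          # L2 regularization strength
        self.epochs = epochs
        self.seed = seed
        self.w = {}             # sparse weight vector: feature -> weight

    def _dot(self, feats):
        return sum(self.w.get(f, 0.0) * v for f, v in feats.items())

    def fit(self, texts, labels):
        rng = random.Random(self.seed)
        data = [(self.extract(t), y) for t, y in zip(texts, labels)]
        step = 0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, y in data:
                step += 1
                eta = 1.0 / (self.lam * step)  # decaying step size
                margin = y * self._dot(feats)  # computed before the update
                scale = 1.0 - eta * self.lam   # L2 shrinkage of all weights
                for f in self.w:
                    self.w[f] *= scale
                if margin < 1.0:               # hinge loss is active
                    for f, v in feats.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * y * v
        return self

    def predict(self, text):
        return 1 if self._dot(self.extract(text)) >= 0 else -1


def build_ensemble(texts, labels):
    """Train one SVM per feature view (assumed views: char 2/3-grams, words)."""
    extractors = [
        lambda t: char_ngrams(t, 2),
        lambda t: char_ngrams(t, 3),
        lambda t: word_ngrams(t, 1),
    ]
    return [LinearSVM(ex).fit(texts, labels) for ex in extractors]


def ensemble_predict(models, text):
    """Fuse the classifiers by majority vote (three voters, so no ties)."""
    votes = [m.predict(text) for m in models]
    return 1 if sum(votes) > 0 else -1


if __name__ == "__main__":
    # Toy data: +1 = Brazilian Portuguese, -1 = European Portuguese.
    texts = ["o ônibus está atrasado", "vou pegar o trem",
             "o autocarro está atrasado", "vou apanhar o comboio"]
    labels = [1, 1, -1, -1]
    models = build_ensemble(texts, labels)
    print(ensemble_predict(models, "o trem está atrasado"))
```

Training one classifier per feature view and fusing their decisions keeps each model simple while letting character-level cues (spelling, diacritics) and word-level cues (lexical choice) contribute separately. A real system would use a tested SVM implementation and a much larger corpus.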

Similar resources

Linguistic Audit as a Professional Activity

The subject of this research is linguistic (or language) audit. The term is new and not yet widely used. Linguistic audit, in particular, is offered as a service by linguistic-consulting agencies. Modern linguistic consulting, according to the author, is a form of stimulating theoretical and practical development of linguistic ecology, a new branch of applied linguistics, ...

The Short Vowels /i/ and /u/ in Iranian Balochi Dialects

The aim of the present paper is to study the status of the short vowels /i/ and /u/ in five selected Iranian Balochi dialects. These dialects are spoken in the Sistan (SI), Saravan (SA), Khash (KH), Iranshahr (IR), and Chabahar (CH) regions of Sistan va Baluchestan province in the southeast of Iran. This study investigates whether these two vowels have the same qualities as the short /i/ an...

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

A Historical and Comparative Study of the Ergative Verb Structure in Ardakani, Dashti, Dashtaki, Yazdi Jewish and Lari Dialects

Introduction: A dialect is a variety of a language used by a group of people whose lexicon, syntax, phonetics and phonology differ from those of other people. The existence of many geographical, economic and social barriers among the speakers of a language causes the emergence of many dialects. As such, each language has many dialects and accents and each dialect has many different ac...

Sentence-level dialects identification in the greater China region

Identifying the different varieties of the same language is more challenging than identifying unrelated languages. In this paper, we propose an approach to discriminate language varieties or dialects of Mandarin Chinese for Mainland China, Hong Kong, Taiwan, Macao, Malaysia and Singapore, a.k.a. the Greater China Region (GCR). When applied to the dialects identification of the GCR, we f...

Journal title:
  • CoRR

Volume abs/1707.00621

Publication date 2017